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ABSTRACT 

In this article, we discuss the composite likelihood estimation of sparse Gaussian graph- 
ical models. When there are symmetry constraints on the concentration matrix or partial 
correlation matrix, the likelihood estimation can be computational intensive. The com- 
posite likelihood offers an alternative formulation of the objective function and yields con- 
sistent estimators. When a sparse model is considered, the penalized composite likelihood 
estimation can yield estimates satisfying both the symmetry and sparsity constraints and 
possess ORACLE property. Application of the proposed method is demonstrated through 
simulation studies and a network analysis of a biological data set. 

Key words: Variable selection; model selection; penalized estimation; Gaussian graphical 
model; concentration matrix; partial correlation matrix 

1. Introduction 

A multivariate Gaussian graphical model is also known as covariance selection model. 
The conditional independence relationships between the random variables are equiva- 
lent to specified zeros among the inverse covariance matrix. More exactly, let X = 
(X^\ ...,X^) be a p-dimensional random vector following a multivariate normal distri- 
bution iVp(//, £), with jj, denoting the unknown mean and E denoting the nonsingular 
covariance matrix. Denote the inverse covariance matrix as — C — (Cij)i<i,j< P - Zero 
entries CV,- in the inverse covariance matrix indicate conditional independence between 
the random variables X' 1 ' and X^ given all other variables (Dempster (1972), Whittaker 
(1990), Lauritzen (1996)). The Gaussian random vector X can be represented by an 
undirected graph G = (V, E), where V contains p vertices corresponding to the p coordi- 
nates and the edges E = (%)i<i<j<p represent the conditional dependency relationships 
between variables X® and X^\ It is of interest to identify the correct set of edges, and 
estimate the parameters in the inverse covariance matrix simultaneously. 
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To address this problem, many methods have been developed. In general, there are no 
zero entries in the maximum likelihood estimate, which results in a full graphical structure. 
Dempster (1972) and Edwards (2000) proposed to use penalized likelihood with the L - 
type penalty V\(\ c ij\)i^j — ^H\ c ij\ 0)> where /(.) is the indicator function. Since the L Q 
penalty is discontinuous, the resulting penalized likelihood estimator is unstable. Another 
approach is stepwise forward selection or backward elimination of the edges. However, this 
ignores the stochastic errors inherited in the multiple stages of the procedure (Edwards 
(2000)) and the statistical properties of the method are hard to comprehend. Meinshausen 
and Biihlmann (2006) proposed a computationally attractive method for covariance se- 
lection; it performs the neighborhood selection for each node and combines the results to 
learn the overall graphical structure. Yuan and Lin (2007) proposed penalized likelihood 
methods for estimating the concentration matrix with the L\ penalty (LASSO) (Tibshi- 
rani (1996)). Banerjee, Ghaoui, and D'aspremont (2007) proposed a block-wise updating 
algorithm for the estimation of the inverse covariance matrix. Further in this line, Fried- 
man, Hastie, and Tibshirani (2008) proposed the graphical LASSO algorithm to estimate 
the sparse inverse covariance matrix using the LASSO penalty through a coordinate-wise 
updating scheme. Fan, Feng, and Wu (2009) proposed to estimate the inverse covariance 
matrix using the adaptive LASSO and the Smoothly Clipped Absolute Deviation (SCAD) 
penalty to attenuate the bias problem. Friedman, Hastie and Tibshirani (2012) proposed 
to use composite likelihood based on conditional likelihood to estimate sparse graphical 
models. 

In real applications, there often exists symmetry constraints on the underlying Gaus- 
sian graphical model. For example, genes belong to the same functional or structure 
group may behave in a similar manner and thus share similar network properties. In 
the analysis of high-dimensional data, clustering algorithm is often performed to reduce 
the dimensionality of the data. Variates in the same cluster exhibit similar patterns. 
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This may result in restrictions on the graphical gaussian models: equality among sep- 
cified elements of the concentration matrix or equality emong specific partial variances 
and correlations. Adding symmetry to the graphical model reduces the number of pa- 
rameters. When both sparsity and symmetry exisits, the likelihood estimation becomes 
computationally challenging. 

Hojsgaard and Lauritzen (2009) introduced new types of Guassian models with sym- 
metry constraints. When the restriction is imposed on the inverse convariance matrix, 
the model is referred as RCON model. When the restriction is imposed on the partial 
correlation matrix, the model is referred as RCOR model. Likelihood estimation on both 
models can be obtained through Newton iteration or partial maximization. However, the 
algorithm involves the inversion of concentration matrix in the interation steps, which can 
be computationally costly in the analysis of large matrices. When sparsity constrainst is 
imposed on the RCON and RCOR model, the likelihood is added extra penalty terms on 
the sizes of the edges. Solving the penalized likelihood with both sparsity and symmetry 
constraint is a challenge. In this article, we investigate the alternative way of formu- 
lating the likelihood. We propose to use composite likelihood as our objective function 
and maximize the penalized composite likelihood to obtain the sparse RCON and RCOR 
model. The algorithm is designed based on co-ordinate descent and soft thresholding 
rules. The algorithm is computationally convenient and it avoids any operations of large 
matrix inverison. 

The rest of the article is organized as follows. In Section 2.1 we formulate the penalized 
likelihood function for the RCON and RCOR modle matrix. In Sections 2.2 and 2.3, 
we present the coordinate descent algorithm and soft thresholding rule. In Section 3, we 
investigate the asymptotic behavior of the estimate and establish the ORACLE property of 
the estimate. In Section 4, simulation studies are presented to demonstrate the empirical 
performance of the estimate in terms of estimation and model selection. In Section 5, we 
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applied our method to a clustered microarray data set to estimate the networks between 
the clustered genes and also compare the networks under different treatment settings. 

2. Method 

2.1 Composite likelihood 

The estimation of Gaussian graphical model has been mainly based on likelihood method. 
An alternative method of estimation based on composite likelihood has drawn much at- 
tention in recent years. It has been demonstrated to possess good theoretical properties, 
such as consistency for the parameter estimation, and can be utilized to establish hypoth- 
esis testing procedures. Let x = (xi, . . . ,x n ) T be the vector of n variables observed from 
a single observation. Let {f(x;cf)),x G X, <fi G ^} be a class of parametric models, with 
X C TZ n , C 1Z q , n > 1, and q > 1. For a subset of {1, . . . , n}, say a, x a denotes a 
subvector of x with components indexed by the elements in set a; for instance, given a 
set a = {1,2}, x a = (xi,X2) T - Let = (9,rj), where 9 G C W, p < q, is the parameter 
of interest, and rj is the nuisance parameter. According to Lindsay (1988), the CL of a 
single vector- valued observation is L c (9; x) = Yl a &A x a) Wa , where A is a collection of 
index subsets called the composite sets, L a (9; x a ) = f a (x a ; 9 a ), and {w a , a 6 A} is a set of 
positive weights. Here f a denotes all the different marginal densities and 9 a indicates the 
parameters that are identifiable in the marginal density f a . 

As the composite score function is a linear combination of several valid likelihood score 
functions, it is unbiased under the usual regularity conditions. Therefore, even though 
the composite likelihood is not a real likelihood, the maximum composite likelihood es- 
timate is still consistent for the true parameter. The asymptotic covariance matrix of 
the maximum composite likelihood estimator takes the form of the inverse of the Go- 
dambe information:iJ(^) T J(^)- 1 iJ(^), where H{9) = E{-J2 aeA d 2 \og f(x a ;9)/d9d9 T } 
and J (9) = var{^ agA d\ogf(x a ; 9)/ 89} are the sensitivity matrix and the variability ma- 
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trix, respectively. Readers are referred to Cox and Reid (2004) and Varin (2008) for a 
more detailed discussion on the asymptotic behavior of the maximum composite likelihood 
estimator. 

2.1 Composite likelihood estimation of rcon model 

Let data X consist of n replications of a multivariate random vector of size p: X = 
(Xi,X 2 , . . . , X n ) T , with Xi = (Xn,Xi2, . . . , X ip ) T following a N p (/j,, S) distribution. For 
simplicity of exposition, we assume throughout that /i — 0. We let 9 = denote the 
inverse covariance, also known as the concentration matrix with elements (9ij), 1 < < 
p. The partial correlation between Xij and X ik given all other variables is then 

Pjk — —9jk/ a/ 9jj9 kk . 

It can be shown than 9j k = if and only if X^ and X ik are conditionally independent 
given all other variables. 

There are different symmetry restrictions on cencentrations first introduced by Hojs- 
gaard and Lauritzen (2009). An RCON(V,£) model with vertex coloring V and edge 
coloring £ is obtained by restricting the elements of the concentration matrix 9 as follows: 
1) Diagonal elements of the concentration matrix 9 corresponding to vertices in the same 
vertex colour class must be identical. 2) Off diagonal entries of 9 corresponding to edges 
in the same edge colour class must be identical. Let V = {Vi, . . . , 14}, where V-y, . . . , 14 is 
a partition of {1, ... , p} vertex class. Let £ = {E±, . . . , Ei}, where Ei, . . . , Ei is a partition 
of {(i,j),l<i<j< p} edge class. This implies given an edge color class, for all edges 
€ E s , 9ij are all equal and hence denoted as 9e 3 - This also implies given a vertex 
color class, for all vertices (i) G V m , 9u are all equal and hence denoted as 9y m , <J U are all 
equal and hence denoted as ov m , 

Following the approach of Friedman, Hastie and Tibshirani (2012), we formulate com- 
posite conditional likelihood to estimate sparse graphical model under symmetry con- 
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straints. The conditional distribution of Xij\x-ij = N(^2 k ^j XikPkj, & 33 )-> where X-ij = 
(xa,Xi 2 , ■ ■ . ,Xij-i,x j+ i, . . .,x ip ), (3 kj = -Okj/Qjj, and a 33 = 1/9 jj. The negative compos- 
ite log-likelihood can be formulated as 

m ^\Y,{N\oga 33 + j-.WX, -XBjWl), 

where Bj is a p— vector with elements (3ij, except a zero at the jth position, and B = 
(Bi, B 2 , . . . , Bp). We propose to estimate the sparse RCON model by minimizing the 
following penalized composite loglikelihood Q(9): 

min i c (9)+n\Y\9 Ea \. 

s 

We employ coordinate-descent algorithm by solving the penalized minimization one 
coordinate at a time. It can be shown that the negative expected Hessian matrix of 
£ c (9) is positive definite because it is the sum of expected negative Hessian matrices of all 

conditional likelihoods: 

,—d 2 £ c (9) s-^J^ d 2 l(xij\x_ij) 

E{ ^9*— ) = 2^2^ E ( — W } 

= E E E{E{ m{x ^ \x^)) = EE E{^{ dl{x ^\ x_ l3 )). 

i=i j=i i=i j=i 

Each vax( dl ^ X QQ~^ \x_j) is positive definite and integrals preserve positive definiteness, 

therefore E( d is positive definite. Thus, when n is sufficiently larege, the objective 
function Q{9) is locally convex at 9 . If the interation steps of the algorithm hits this 
neighborhood, the algorithm will converge to 9 . 

The co-ordinate descent algorithm proceeds by updating each parameter of the objec- 
tive function one at a time. The first derivative of the objective function with respect to 
the edge class parameter is as follows. The technical derivation is in the Appendix. 

If =E E E o»xJx t) e E , + 

i=i i;(i,j)eE s i-,(i,j)eE s 

, v \ [ ' 

(E X J( E X ^ + (jjj E E XfX l 9 lj )+nsgn(9 Es ), 

E=l i;(i,j)eE e i;(i,j)eE s l;(l,j)eEc ' 



where E c s = ^ j and ^ E s }. Therefore the update for 9 Es is 

, _ gHZg=i g(g !iM eg f *i) + g Es,(ij) eg . XTmA/n, A) 

(E^iE i;W ) eB .E, ; (, J ) 6B .^^«)/n 

where 5(2:, A) = sign(z)(|2:| — A) + is the soft-thresholding operator. Let C = \X T X 

denote the sample covariance matrix. Given the color edge group E s , we construct the 

edge adjancency matrix T Ea , with T Es = 1, if G E s , and T Es = otherwise. We can 

simplify the updating expression as follows: 

~ S(-tr(T E *C) + tr(T^(T^ c B)C), A) 
Ea ~ tr (T E °{T E ° a) C) ' 

where denotes the componentwise product, and a denotes apxp matrix of diag(cr"). 

For notational convenience, let 9 denote a p x p matrix with diagonal elements equal 
to zero and off-diagonal elements equal to that of 9. The first derivative of Q{9) with 
respect to the vertex class is as follows: 

dQ{6) 



d°v m 

n \ 1 Cjj . 

2 ! au (a-u)! ' >li l 

jev m v ; 



(3) 



where qj = Y^ =1 Y7v=i Cii'Oij&i'j- Therefore the solution of 

-\Vm\ + yJ\V m \ 2 + *(E j ev m Qj)(Y, j ev m C jj ) 
°v m 2T a- ' 

where \V m \ denotes the cardinality of V m . Let diagonal matrix T Vm denote the generator 
for the vertex color class, with Tjp = 1 for j G V m , and Tj™ = otherwise. To simplify 
the notation, we have J2jev m = te(T Vm C), and J2jev m Qj = tr(T Vm 9C9). Because C is 
positive definite, Yljev m 1j > ®- Therefore, the quadratic equation has one unique positive 
root. Alternating the updating scheme throughout all the 9e s , and 9y m until convergence, 
we obtain the penalized sparse estimate of the concentration matrix under RCON model. 

2.2 Estimation of rcor model 



An RCOR (V, S) model with vertex colouring V and edge coloring £ is obtained by 
restricting the elements of 6 as follows: (a) All diagonal elements of 6 (inverse partial 
variances) corresponding to vertices in the same vertex colour class must be identical, 
(b) All partial correlations corresponding to edges in the same edge colour class must be 
identical. Given an edge color class, for all edges G E s , p^ are all equal and hence 
denoted as pe s - This also implies given a vertex color class, for all vertices (i) G V m , On 
are all equal and hence denoted as 9 Vm , and a 11 are all equal and hence denoted as ay m , 
We formulate the composite likelihood in terms of pe 3 and ay m . 

For notational convenience, define apxp matrix p with p^ = p^ for i ^ j and 
Pij = for i = j. Let pj denote the jth column of the matrix p. Define a p-element vector 
od = (c 11 , • • • , <J PP ) T . The composite likelihood is formulated as 

UP, o) = \ X>log^' + - X(pj a-J)(a^\\l}. 

i=i 

We propose to estimate the sparse RCOR model by minimizing the following penalized 
composite loglikelihood Q(p,a): 

min l c (p,cr)+n\y \pe \- 

0E 3 ,l<s<l,O Vm ,l<m<k ' 

s 

The partial derivative of Q(p, a) with respect to the partial correlation is as follows: 

dQ(p,cr) 
dpE 3 

=np E M^- 1/2 T^) T C(a-^ 2 T E ^ + ntr {[a'^p © T E ^) T C(a-^ 2 T E ^ (4) 
-^((Xa-VyXia-WTB'fj +nsgn(0 B J. 
The thresholded estimate of the partial correlation takes the following form: 
5(tr (T E ° (a- i Ca~ 5 ) ) - tr (T E ° (T E <> p)(a~^Ca~^)), A) 



PE a = 



t T (T E '.T E ' (a~ 2(7(7-2)) 
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The partial derivatives with respect to oy m is as follows: 

d£(p,a) 



da Vr 

n 

2 L ov m nal na l v ' n 



(\y™\ _ ^j£V m X J X J ^i£V m ^j€V m 2x I x jPij 2 -| \ - T , 

i„„ nrT 2_ + T7YT _2_ + n a v m 2^ x i x iPv/V a n 

(i,j);ieV m ,j£V m 



I p I p 

- YY Y X I X i'PvPi'3 1" Y Y Y x I x i'PvPi'j/v^}- 



V m j = l idVmi'dVm n(J V m 3 = l idVmi'iVrr, 



Re-express the above expression in terms of y — ^/<Jy m . We solve the equation 

\V m \y 2 -by-a = 0, 



with 



and 



The solution would be 



b+Jb 2 + 4a\V n 

y= — 



(5) 



a = Y x J x i/ n - Y Y 2x J x iPij/ n +YY Y x J x i'PijPi'j/ n 

jeV m ieV m j<=V m j=li£V m i'£V m (g) 

= ti(T Vm C) - 2ti(T Vm CT Vm p)) + tr(pT Vm CT Vm p) 



b = ~YY 2x J x iPij/(nV^) + YY Y x I x i'PijPi'j/(nVa^) 

i&V m j<£V m j=l i£V m i><£Vm (7) 

= -2tr{T v ™Ca- l/2 T v ™p) + tr {pT v ™ a' 1 1 2 CT Vm p). 



2|Kn| 

The positive solution is unique because 

a =ti(c(T Vm - pT Vm ) T {T Vm - pT Vm )^J > 0. (8) 

2.3 Asymptotic properties 

In this section, we discuss the asymptotic properties of the penalized composite like- 
lihood estimates for sparse symmetric Gaussian graphical models. In terms of the choice 
of penalty function, there are many penalty functions available. As the LASSO penalty, 
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Px(\9i\) = A|0;|, increases linearly with the size of its argument, it leads to biases for the 
estimates of nonzero coefficients. To attenuate such estimation biases, Fan and Li (2001) 
proposed the SCAD penalty. The penalty function satisfies p\(Q) = 0, and its first-order 
derivative is 

p'M = A{/(0 < A) + (a ^~^ I(0 > A)}, for 9 > 0, 

where a is some constant, usually set to 3.7 (Fan and Li, 2001), and (t) + = tl(t > 0) 
is the hinge loss function. The SCAD penalty function does not penalize as heavily as 
the Li penalty function on parameters with large values. It has been shown that with 
probabability tending to one, the likelihood estimation with the SCAD penalty not only 
selects the correct set of significant covariates, but also produces parameter estimators 
as efficient as if we know the true underlying sub-model (Fan & Li, 2001). Namely, the 
estimators have the so-called ORACLE property. However, it has not been investigated if 
the oracle property is also enjoyed by composite likelihood estimation of GGM with the 
SCAD penalty. The following discussion is focused on the RCON model but it can be 
easily extended to RCOR model. 

For notational convenience, let z = {E s : 9 Es ^ 0} U V denote all the nonzero edge 
classes and all vertex classes and z c = {E s : 9 Eg = 0} denote all the zero edge classes. 
The parameter vector can be expressed as 9 = (9 El , ■ ■ ■ , 9e v 9y 1 , ■ ■ ■ , 9v k )- Let 9 denote 
the true null value. 

Theorem 1. Given the SCAD penalty function p\(9), if X n — > 0, and ^Jn\ n — > oo as 
n — > oo, then there exist a local maximizer 9 to Q{9) and \ \9 — 9 \ \ = O p {n^). Furthermore, 
we have 

lim P(9 z c = 0) = 1. 

Proof. Consider a ball ||# — #o|| < Mn~^ for some finite M. Applying Taylor Expansion, 
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we obtain: 

dQ{9)/89 ] = d£ c (9)/d9 J - np' An (|^|)sign(^) 

(9) 

= de c (0o)/d9 j+ Yl (0 j/ -9 j/o )d 2 e c (9*)/d9 j 9 r -np' Xn (\9 j \)si ga (9 j ), 
j'e(fuv) 

for j G {£ U V) and some 9* between and O - As E(d£ c (9 )/d9 j ) = 0, de c (9 )/d9j = 
O p (n5). As |0* — 0| < MrT^ and d 2 l c (9*)/d9 j 9 j , = O p (n) componentwise. First we 
consider j G z c . Because lim inf^oolim mf^ 0+ p' Xn ((3) / \ n > 0, and A n — > 0, and y/n\ n — > 
oo as n — >■ oo, the third term dominates the the first two terms. Thus the sign of 
dQ(9)/d9j is completely determined by the sign of (5j. This entails that inside this Mn~ l l 2 
neighborhood of /3 , dQ(9)/d9 j > 0, when 6j < and dQ(9)/d9 j < 0, when 6j > 0. 
Therefore for any local maximizer 9 inside this ball, then 9j = with probability tending 
to one. As P\ n (0) = 0, we obtain 

Q{9) - Q(9 ) = £ c (9) - £ c (9 ) - n £ (p An (|^|) - P xMo\)) 

je(£uv) 



< { 9-9 r d ^ + (9-9 f d ^(9-9 ) 
~ n^(p An (|^o|)sign(^ )(^ - 0jo)+p"x n (\0jo\)(0j ~ M 2 (l + °(1))Y 

(10) 

For n large enough and 9 j0 ^ 0, p' x {\9 j0 \) = and p x (\9 j0 \) = 0. Furthermore, d 2 £ c {9*)/d9 2 
converges to H{9) in probability, which is negative definite. Thus, we have Q{9) < Q(9 ) 
with probability tending to one for 9 on the unit ball. This implies there exists a local 
maximizer of 9 such that \9 — 9 \ = O p (n -1 / 2 ). □ 

Next, we establish the asymptotic distribution of the estimator 9. Let 9 Z denote the 
sub-vector of nonzero parameters in 9. Define a matrix Si = diag{pj' A i(0jo); j £ z}, and 
a vector b\ = (p' Xn (9j)sign(9j ); j G z). Let H zz denote the sub-matrix of H(9) and V zz 
denote the sub-matrix of V(9) corresponding to the subset of z. 
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Theorem 2. Given the SCAD penalty function p\(0), if X n — > and y/n\ n — > oo, as n — > 
oo, then the sub-vector of the root-n consistent estimator 9 Z has the following asymptotic 
distribution: 

V^{H ZZ + Ei){0 z - 9 z0 + (H zz + Si)- 1 ^} -)■ AT{0, \4 2 }, asn -> oo. 

Proof. Based on Taylor expansion presented in Proof to Theorem 1, we have 

dQ{9) dl c (6 ) dH c (9*) ^ - 
= = + -MjerQ, - ^o) - nfc - n(Ei + (1))(6>, - (11) 

As # — )> # hi probability, ~ ~g e ^ — -^zz in probability. The limiting distribution of 
is N{0, V zz }. According to Slutsky's theorem, we h&vey/n(H zz + Y, 1 ){9 Z - 9 z0 + 
(^« + E 1 )- 1 6 1 }^JV{0,y«)}. □ 

Next we discuss the estimation of the Hessian matrix if Z2 and the variability matrix 
V zz . As the second differentiation is easy to calculate, we obtain H zz = d 2 £ c (9)/d9 z d9j\g. 
The variability matrix based on sample covariance matrix of the composite score vectors 
is computationally harder as we need to compute the composite score vector for each 
observation, where the number of observations can be large. Alternatively, we perform 
bootstrap to obtain the 

1 m 

v zz = ^—^ - W(s {m) 0) - s), 

where S(9) = d£ c (9)/d9 z , S^ m \9) denotes the score vector evaluated with the composite 
estimator obtained from the original sample and the data from the mth bootstrap sample 
and S = Y^iLi S^ m \0)/m. In pratice, we only need a moderate number of bootstrap 
samples to obtain V zz . 

3. Numerical analysis 

We analyze the "math" data set from Mardia et al. (1979), which consists of 88 
students in 5 different mathematics subjects: Mechanics (me), Vectors (ve), Algebra (al), 
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Analysis (an) and Statistics (st). The model with symmetry proposed by Hojsgaard and 
Lauritzen (2008) has vertex color classes {al}, {me, st}, {ve, an} and edge color classes 
{(al,an)}, {(an,st)}, {(me,ve), (me,al)}, and {(ve,al), (al,st)}. We perform composite 
likelihood estimation on this symmetric model with no penalty imposed on the parameters. 
In Table 1, the composite likelihood estimates and their standard deviations calculated 
through bootstraps are compared with those obtained by maximum likelihood estimator 
and a naive estimator. The naive estimator estimates the edge class parameters and 
vertex class parameters by simply averaging all the values belonging to the same class in 
the inverse sample covariance matrix. All three methods yield results that are very close 
to each other. 

Next we examine the performance of the unpenalized composite likelihood estimator 
on large matrices. First we consider the RCON model. We simulate under different sce- 
narios with n varying from 250 to 1000 and p varying from 40, 60 to 100. We include 30 
different edge classes and 20 different vertex classes. We simulate a sparse matrix with 
0e = (0 25 , 0.2591, 0.1628, -0.1934, 0.0980, 0.0518), and 6 V = (1.3180,1.8676,1.788004, 
1.7626, 1.6550, 1.1538, 1.3975, 1.7877, 1.7090, 1.6931, 1.46313, 1.5131, 1.7084, 1.7344, 1.1441, 
1.8059,1.7446, 1.8522, 1.3146, 1.1001), where P denotes a zero vector of length p. The 
number of nonzero edges ranges from about 250 to 1640. In Table 2, we compare the sum 
of squared errors of the composite likelihood estimates with the naive estimates from 100 
simulated data sets. The proposed composite likelihood estimates consistenly enjoy much 
smaller sum of squared errors across all settings. 

We also investigate the empirical performance of the proposed composite likelihood 
estimator under the RCOR model. We simulate under different scenarios with n varying 
from 250 to 1000 and p varying from 40, 60 to 100. We include 30 different edge classes and 
20 different vertex classes. We simulate a sparse matrix with p £ = (0 2 6, 0.1628, —0.1534, 
0.0980,0.0518) and 9 V = (3.0740,3.6966,3.7772,3.5475, 3.2841,3.4699,3.7235, 3.5987, 
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3.3313, 3.8183, 3.9236, 3.9008, 3.9011, 3.0470, 3.0139, 3.2072, 3.8438, 3.4823, 3.9373, 3.0125.) 
In table 3, we provide the errors \\pg — ps\\i an d Hv^v — v^vlh for the composite like- 
lihood estimates and the naive estimates from 100 simulated data sets. With regard to 
the estimated partial correlations, the composite likelihood estimates yield consistently 
smaller errors compared to the naive estimates. With regard to the conditional standard 
deviations, the composite likelihood estimates yield slightly larger errors under sample 
size n = 250, and n = 500. With sample size n = 1000, the composite likelihood estimates 
have smaller errors than the naive estimates. For example, with p = 100 and the number 
of true edges close to 1300, the naive estimate for the conditional standard deviation has 
error 1.8116, while the composite likelihood estimate has error 0.2923. 

We further examine the empirical performance of the penalized composite likelihood 
estimator. We simulate the RCON model using the same settings as of Table 1. We 
consider different scenarios with n = 250 orn = 500, and p = 40 or p = 60. We use the 
penalized composite likelihood estimator to estimate the sparse matrix. The tuning pa- 
rameter is selected by composite BIC, which is similar to BIC with the first term replaced 
by the composite likelihood evaluated at the penalized composite likelihood estimates. 
For each setting, 100 simulated data sets are generated and for each data we calculate the 
number of false negatives and false positives. In Table 4, it is shown that the proposed 
method has satisfactory model selection property with very low false negative and false 
positive results. For example, with n = 500 and p = 60, each simulated data set has 
an average number of 1474 zero edges and 325 nonzero edges. The proposed method 
identifies an average of zero false negative result and 0.58 false positive result. The size 
of the tuning parameters is also listed in Table 4. 

5. Application 

We apply the proposed method on a real biological data set. The experiment was con- 
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ducted to examine how GM-CSF modulates global changes in neutrophil gene expressions 
(Kobayashi et al, 2005). Time course summary PMNs were isolated from venous blood 
of healthy individuals. Human PMNs (107) were cultured with and without 100 ng/ml 
GM-CSF for up to 24 h. The Experiment was performed in triplicate, using PMNs from 
three healthy individuals for each treatment. There are in total 12625 genes monitored, 
each gene is measured for 9 replications at time 0, and measured for 6 times at time 3, 6, 
12, 18, 24h. At each of these 5 points, 3 measurements were obtained for treatment group 
and 3 measurements were obtained for control group. We first proceed with standard 
gene expression analysis. For each gene, we perform an ANOVA test on the treatment 
effect while aknowledging the time effet. We rank the F statistic for each gene and select 
the top 200 genes who have the most significant changes in expression between treatment 
and control group. Our goal is to study the networks of these 200 genes and also compare 
the network of the 200 genes between the treatment and control. We perform clustering 
analysis on the selected 200 genes, where the genes clustered together can be viewed as a 
group of genes who share similar expression profiles. This imposes symmetry constraints 
to the graphical modelling. We cluster these top 200 genes into 10 clusters based on 
K-means method. Therefore, there are in total of 55 edge classes and 10 vertex classes to 
be estimated based on a 200 by 200 data matrices. We perform penalized estimation and 
compare the result of the estimated edges between the treatment versus control. The es- 
timated between-cluster edges are provided in Figure 1. It is observed that although most 
between-cluster interactions are small, there are a few edges with large values indicating 
strong interactions. It is also observed that the edge values obtained from the treatment 
group and the control group are mostly comparable and only a few edges exhibit big 
differences. For instance, edges between cluster 1 and 5 and between cluster 4 and 6 have 
big differences in treatment group versus control group. These findings are worth further 
biological investigation to unveil the physical mechanism underlying the networks. 
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6. Conclusion 

When there are both sparsity and symmetry constrainsts on the graphical model, the 
penalized composite likelihood formulation based on conditional distributions offers an 
alternative way to perform the estimation and model selection. The estimation avoids the 
inversion of large matrices. It is shown that the proposed penalized composite likelihood 
estimator will threshold the estimate for zero parameters to zero with probability tending 
to one and the asymptotic distribution of the estimates for non-zero parameters follow 
the multivariate normal distribution as if we know the true submodel containing only 
non-zero parameters. 
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Appendix 

• The detailed derivation of the first derivatives with respect to 9 Es under RCON 

model is as follows: 

dQ{6) 

^ 1 dWXj + XOja^Wl , , 
= E^ d0E J ll2 +ns g n(6 Es ) 

j=i 



j'=i 

= j^(X ] +X~e ] a"f{ J2 *i)+nsgn(0 B .) 

i =1 i;(i,j)eE 3 

=E( X J( E ^)+^'( E X * T ( E E xm))+™«*(0e.) 

3=1 i;(i,j)£E s i;(i,j)eE s l;(l,j)eE s l;(l,j)$E a 

P / V 



i =1 i;(i,j)eE s i;(i,j)eE a S'=i i-,(i,j)eE 3 

°" E E ^i^-)+nsgn(fe.). 



(i,j)e£ s l;(l,ME s 

(12) 

• The detailed derivation of the first derivatives with respect to 9 Vm under RCON 

model is as follows: 

dQ(6) 

= 2 E ^7 + ^~ (13) 

=™ Vf- + 

jev m v y 

where dj = xjxj/n, and = Y%=i Y7v=i Qi'QiA'j- 
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• The detailed derivation of the first derivatives with respect to p Es under RCOR 
model is as follows: 

dQ(p,cr) 
dp Es 

v . , __ ,t a ~ (14) 

= E -frA °n /2 )^ ~ X, X(-^- a D 1/2 ) + nsgn(^). 

Note that (pj o^ 2 ) = (cr -1 / 2 /?)^-], the jth column of the matrix. Also we have 

the vector ©cr^ 2 = (<j~ 1 / 2 T Es )^ the jth column of the matrix. Furthermore, 

p = J2 S ' Pe s ,T Es '- This leads to: 

dQ(p,(?) 
dpE s 

= j^PE^T^X^Xia^T^ + (a-^p T E \ j} X T X (a'^T^ 
j=i 

~ -^=XjX(a-^T E °) l3]+ nsgn(9 Ea ) 
V (J 33 

=np E ^i({a- l l 2 T E °) T C{a- l l 2 T E ^ + ntr [{a'^p T E °) T C{<y- l l 2 T E ^ 
- ti(^Xa- l / 2 ) T X(a-^ 2 T E ^ +nsga(9 Es ). 

(15) 
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Table 1: Comparison of likelihood, composite likelihood, moment estimates on "math" 
dataset 



parameter 


est 


std 


est 


std 


est 


std 




likelihood 




composite 




moment 




vccl 


U-UiOl 


u.uuo / 


u.uuuo 


u.uuuo 


U.UUO 1 


u.uuuo 


vcc2 


0.0059 


0.0006 


0.0074 


0.0006 


0.0098 


0.0013 


vcc3 


0.0100 


0.0009 


0.0176 


0.0020 


0.0182 


0.0029 


eccl 


-0.0080 


0.0015 


-0.0062 


0.0009 


-0.0068 


0.0019 


ecc2 


-0.0018 


0.0007 


-0.0008 


0.0005 


-0.0021 


0.0008 


ecc3 


-0.0030 


0.0004 


-0.0027 


0.0002 


-0.0019 


0.0006 


ecc4 


-0.0047 


0.0008 


-0.0051 


0.0005 


-0.0055 


0.0012 



Table 2: Comparison of \\9 — 9\\2 from composite likelihood and moment estimates on 
simulated large dataset for RCON model 



n 


P 


comp 


moment 


#true edges 


250 


40 


0.2002 


2.3671 


256.7475 






(0.0757) 


(0.4580) 


(15.5644) 


250 


60 


0.1109 


5.6270 


590.4040 






(0.0367) 


(0.7201) 


(23.5606) 


250 


100 


0.0509 


23.7040 


1647.0707 






(0.0155) 


(2.0364) 


(34.7461) 


500 


40 


0.0901 


0.5482 


256.7475 






(0.0272) 


(0.1439) 


(15.5644) 


500 


60 


0.0588 


1.0924 


590.4040 






(0.0177) 


(0.1728) 


(23.5606) 


500 


100 


0.0252 


3.3530 


1647.0707 






(0.0098) 


(0.2781) 


(34.7461) 


1000 


40 


0.0467 


0.1548 


256.7475 






(0.0160) 


(0.0444) 


(15.5644) 


1000 


60 


0.0282 


0.2596 


590.4040 






(0.0090) 


(0.0491) 


(23.5606) 


1000 


100 


0.0125 


0.6686 


1647.0707 






(0.0037) 


(0.0684) 


(34.7461) 
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Table 3: Comparison of the composite likelihood and moment estimates on simulated 
large dataset for RCOR model 



11 


P 


comp 


moment 


comp 


moment 


#truc edges 






Hp- po)\h 


\\p- Po)\h 


i / '1 1 1 9 . . 

\\° 1/2 -°l' 1 la 


i/o 1 / 9 . . 

\\° 1/2 -°l' \h 




250 


40 


0.0317 


0.0350 


2.3869 


2.2941 


206.3200 






(0.0043) 


( 0.0050) 


(0.0185) 


(0.0179) 


(13.5011) 


250 


60 


0.0196 


0.0231 


2.3886 


2.2447 


474.0400 






(0.0023) 


( 0.0029) 


(0.0146) 


(0.0149) 


(22.0247) 


250 


100 


0.0097 


0.0140 


2.3905 


2.1449 


1316.9200 






(0.0015) 


( 0.0019) 


(0.0118) 


(0.0126) 


(33.0795) 


500 


40 


0.0317 


0.0350 


0.9881 


0.9226 


206.3200 






(0.0043) 


( 0.0050) 


(0.0131) 


(0.0126) 


(13.5011) 


500 


60 


0.0196 


0.0231 


0.9891 


0.8874 


474.0400 






(0.0023) 


( 0.0029) 


(0.0103) 


(0.0106) 


(22.0247) 


500 


100 


0.0097 


0.0140 


0.9903 


0.8167 


1316.9200 






(0.0015) 


( 0.0019) 


(0.0083) 


(0.0089) 


(33.0795) 


1000 


40 


0.0317 


0.0350 


0.0375 


0.0615 


206.3200 






(0.0043) 


( 0.0050) 


(0.0062) 


(0.0076) 


(13.5011) 


1000 


60 


0.0196 


0.0231 


0.0301 


0.0794 


474.0400 






(0.0023) 


( 0.0029) 


(0.0046) 


(0.0071) 


(22.0247) 


1000 


100 


0.0097 


0.0140 


0.0221 


0.1255 


1316.9200 






(0.0015) 


( 0.0019) 


(0.0034) 


(0.0063) 


(33.0795) 



Table 4: Model selection performance of penalized composite likelihood based on 100 
simulated datasets under each setting 



n 


P 


#zero edge 


#true edges 


fn 


fp 


tuning parameter 


250 


40 


651.6300 


120.5500 


27.8200 


0.0000 


1.2770 






7.7429 


12.9008 


(10.2152) 


(0.0000) 


(0.3194) 


250 


60 


1469.2300 


323.0300 


2.3000 


5.4400 


1.4985 






19.5349 


16.4890 


(11.4111) 


(19.2518) 


(0.2514) 


500 


40 


651.6300 


121.4700 


26.9000 


0.0000 


1.2650 






7.7429 


13.2432 


(10.6520) 


(0.0000) 


(0.3705) 


500 


60 


1474.0900 


325.3300 


0.0000 


0.5800 


1.0910 






13.6929 


11.7903 


(0.0000) 


(5.8000) 


(0.1961) 
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Figure 1: Estimated between-cluster edges for treatment and control groups 
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Numbers in parenthesis indicate the cluster IDs, followed by the estimated 6e s 
for the control and treatment groups. 
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